Lagrange optimality system for a class of nonsmooth convex optimization


Bangti Jin, Tomoya Takeuchi


In this paper, we revisit the augmented Lagrangian method for a class of nonsmooth convex optimization problems. We present the Lagrange optimality system of the augmented Lagrangian associated with these problems and establish its connections with the standard optimality condition and the saddle point condition of the augmented Lagrangian; these connections provide a powerful tool for developing numerical algorithms. We apply a linear Newton method to the Lagrange optimality system to obtain a novel algorithm applicable to a variety of nonsmooth convex optimization problems arising in practical applications. Under suitable conditions, we prove the nonsingularity of the Newton system and the local convergence of the algorithm.
1 Introduction

In this paper we consider the augmented Lagrangian method for solving a class of nonsmooth convex optimization problems

$$\min_{x\in X}\; f(x) + \phi(Ex), \tag{1.1}$$

where the function $f: X\to\mathbb{R}$ is convex and continuously differentiable on a Banach space $X$, $\phi: H\to\mathbb{R}\cup\{+\infty\}$ is a proper, lower semicontinuous and convex function on a Hilbert space $H$, and $E$ is a bounded linear operator from $X$ to $H$. We assume that the proximity operator of the convex function $\phi$ has a closed-form expression. This problem class encompasses a wide range of optimization problems arising in practical applications, e.g., inverse problems, variational problems, image processing, signal processing and statistics, to name a few [1, 2, 3, 4, 5, 6, 7].

The augmented Lagrangian method was proposed independently by Hestenes [8] and Powell [9] for 
solving nonlinear programming problems with equality constraints. The method was studied in relation 
to Fenchel duality and generalized to nonlinear programming problems with inequality constraints by 
Rockafellar [10, 11]. Later it was further generalized to problem (1.1) by Glowinski and Marroco [12], where the augmented Lagrangian is given by

$$\mathcal{L}_c(x,v,\lambda) = f(x) + \phi(v) + (\lambda, Ex - v) + \frac{c}{2}\|Ex - v\|^2.$$

The inner product $(\lambda, Ex - v)$ dualizes the equality constraint, and the quadratic term penalizes the constraint violation, for the following equality-constrained problem equivalent to problem (1.1):

$$\min_{x\in X,\ v\in H}\; f(x) + \phi(v) \quad\text{subject to}\quad Ex = v.$$

A solution of problem (1.1) can be characterized, under certain conditions on $f$, $\phi$ and $E$, as a saddle point of the augmented Lagrangian, and the strong duality theorem leads to first-order algorithms for the dual function $\theta(\lambda) = \inf_{x,v}\mathcal{L}_c(x,v,\lambda)$. In practical implementations, the combination of dualization and penalization alleviates both the slow convergence of ordinary Lagrangian methods and the ill-conditioning of penalty methods as $c\to\infty$. Due to these advantages over the standard Lagrangian and penalty formulations, a large number of first-order algorithms based on the augmented Lagrangian $\mathcal{L}_c$ have been developed for a wide variety of applications; see, e.g., [1, 13, 14, 15].

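An alternative Lagrangian for (1.1) was introduced by Fortin [16]; it is obtained by employing the partial conjugate of the augmented perturbation bifunction $F_c(x,v) = f(x) + \phi(Ex - v) + \frac{c}{2}\|v\|^2$ due to Rockafellar [10]:

$$\begin{aligned}
L_c(x,\lambda) &= \min_{v\in H}\big((v,\lambda) + F_c(x,v)\big) = \min_{v\in H}\Big((v,\lambda) + f(x) + \phi(Ex - v) + \frac{c}{2}\|v\|^2\Big)\\
&= \min_{u\in H}\Big((Ex - u,\lambda) + f(x) + \phi(u) + \frac{c}{2}\|Ex - u\|^2\Big)\\
&= f(x) + \min_{u\in H}\Big(\phi(u) + (\lambda, Ex - u) + \frac{c}{2}\|Ex - u\|^2\Big)\\
&= f(x) + \phi_c(Ex + \lambda/c) - \frac{1}{2c}\|\lambda\|^2,
\end{aligned}$$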

where $c$ is a positive constant and $\phi_c$ is the Moreau envelope of $\phi$ (see Section 2 for the definition). It was shown that a saddle point of $L_c$ is also a saddle point of the standard Lagrangian and conversely [16, Thm. 2.1]. A first-order algorithm, often referred to as the augmented Lagrangian algorithm and quite similar to the one developed in [12], was proposed for certain special cases of the function $\phi$ [16, Thm. 4.1]. The augmented Lagrangian method was further studied by Ito and Kunisch [17] for the optimization problem

$$\min_{x\in C}\; f(x) + \phi(Ex), \tag{1.2}$$

where $C$ is a convex set in $X$. One of their major achievements concerns the existence of a Lagrange multiplier for problem (1.2): it was shown that, under appropriate conditions, the Lagrange multipliers of a regularized problem defined by the augmented Lagrangian $L_c$ converge, and the limit is a Lagrange multiplier of problem (1.2). In addition to this contribution, Fortin's augmented Lagrangian algorithm was extended to a more general class of convex functions $\phi$, and its convergence was established. Note that problem (1.2) can be reformulated as problem (1.1) by redefining the convex function $\phi$ and the linear map $E$ as $\tilde\phi(y,w) := \phi(y) + \chi_C(w)$ and $\tilde Ex := (Ex, x)$, respectively, where $\chi_C$ is the characteristic function of the convex set $C$. Hence it shares an identical structure with problem (1.1).
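As a quick sanity check of the last step of Fortin's derivation, the following minimal Python/NumPy sketch compares a brute-force grid minimization over $u$ with the closed form $\phi_c(Ex+\lambda/c) - \frac{1}{2c}\|\lambda\|^2$. It assumes the illustrative scalar case $H=\mathbb{R}$ and $\phi(u)=\alpha|u|$, and treats $Ex$ as a given number; the term $f(x)$ is common to both sides and is omitted. All names are hypothetical.

```python
import numpy as np

alpha, c = 1.0, 2.0                      # illustrative choice: phi(u) = alpha*|u| on H = R

def phi(u):
    return alpha * np.abs(u)

def envelope(z):
    # Moreau envelope phi_c(z), evaluated at p = prox_{phi/c}(z) (soft-thresholding)
    p = np.sign(z) * np.maximum(np.abs(z) - alpha / c, 0.0)
    return phi(p) + 0.5 * c * (p - z) ** 2

u = np.linspace(-10.0, 10.0, 400001)     # fine grid for the inner minimization over u
for Ex, lam in [(0.3, -1.2), (2.0, 0.5), (-1.5, 3.0)]:
    brute = np.min(phi(u) + lam * (Ex - u) + 0.5 * c * (Ex - u) ** 2)
    closed = envelope(Ex + lam / c) - lam ** 2 / (2 * c)
    print(Ex, lam, brute, closed)        # the two values agree up to the grid resolution
```

For instance, for $(Ex,\lambda)=(2,0.5)$ both expressions equal $1.9375$.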

The augmented Lagrangian $L_c$ is Fréchet differentiable (cf. Section 3), which motivates the use of the Lagrange optimality system

$$D_xL_c(x,\lambda) = 0 \quad\text{and}\quad D_\lambda L_c(x,\lambda) = 0, \tag{1.3}$$

to characterize the saddle point and hence the solution of problem (1.1). This perspective naturally leads to the application of Newton methods for solving the nonlinear system. However, the Moreau envelope involved in (1.3) (cf. Proposition 3.1) is twice continuously differentiable if and only if the same is true for the convex function $\phi$ [18], and thus the standard (classical) Newton method cannot be applied directly to the Lagrange optimality system. Semismooth Newton methods and quasi-Newton methods are possible alternatives, but both have drawbacks when applied to the Lagrange optimality system: the inclusion appearing in the chain rule for a composite map makes it difficult to identify theoretically a generalized or limiting Jacobian of $D_{x,\lambda}L_c$ for semismooth Newton methods, while the superlinear convergence of quasi-Newton methods holds only when the system to be solved is differentiable at the solution [19]. We opt instead for linear Newton methods [20] to solve the Lagrange optimality system (1.3), where the generalized Jacobian of $D_{x,\lambda}L_c$ used in semismooth Newton methods is replaced with a linear Newton approximation (LNA) of $D_{x,\lambda}L_c$. Calculus rules, which provide a systematic way of generating LNAs of a given map, reduce the construction of an LNA of the Lagrange optimality system to the computation of the (Clarke) generalized or limiting Jacobian of the proximity operator involved in the system; cf. Section 4.
The focus of this work is twofold. First, we present the Lagrange optimality system, which was provided in neither [16] nor [17], and establish its connection with the standard optimality system of problem (1.1) and the saddle point condition of the augmented Lagrangian. Second, we develop a Newton-type algorithm for the Lagrange optimality system. To the best of our knowledge, this is the first work using the Lagrange optimality system to develop Newton-type algorithms for the nonsmooth convex optimization problem (1.1). These two aspects represent the essential contributions of this work.

The rest of the paper is organized as follows. In Section 2 we collect fundamental results on the Moreau envelope and the proximity operator, which provide the main tools for the analysis. In Section 3 we investigate the connections among the optimality system for problem (1.1), the Lagrange optimality system and the saddle points of the augmented Lagrangian $L_c$. In Section 4 we develop a Newton method for problem (1.1), which exhibits local Q-superlinear convergence.

1.1 Notations 

We denote by $X$ a real Banach space with the norm $|\cdot|$. The duality bracket between the dual space $X^*$ and $X$ is denoted by $\langle\cdot,\cdot\rangle_{X^*,X}$. For a twice continuously differentiable function $f$, its derivative is denoted by $Df(x)$ or $D_xf(x)$, and its Hessian by $D^2f(x)$. $H$ is a Hilbert space with the inner product $(\cdot,\cdot)$, and the norm on $H$ is denoted by $\|\cdot\|$. The set of proper, lower semicontinuous, convex functions defined on the Hilbert space $H$ is denoted by $\Gamma_0(H)$. The effective domain of a function $\phi\in\Gamma_0(H)$ is denoted by $\mathcal{D}(\phi) = \{z\in H \mid \phi(z) \text{ is finite}\}$, and it is always assumed to be nonempty. For a function $\phi\in\Gamma_0(H)$, the convex conjugate $\phi^*$ is defined by $\phi^*(z^*) = \sup_{z\in H}\big((z^*,z) - \phi(z)\big)$. A subgradient of $\phi$ at $x\in H$ is an element $g\in H$ satisfying

$$\phi(y) \ge \phi(x) + (g, y - x) \quad\forall y\in H.$$

The subdifferential of $\phi$ at $x$ is the set of all subgradients of $\phi$ at $x$, and is denoted by $\partial\phi(x)$.

2 Moreau envelope and proximity operator 

The central tools for analyzing the augmented Lagrangian approach are the Moreau envelope and the proximity operator. We recall their definitions and the basic properties relevant to the development of the Lagrange multiplier theory. We note that for $\phi\in\Gamma_0(H)$ and $c>0$, the strictly convex function $u\mapsto\phi(u) + \frac{c}{2}\|u - z\|^2$ admits a unique minimizer.

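Definition 2.1. Let $\phi\in\Gamma_0(H)$ and $c>0$. The Moreau envelope $\phi_c: H\to\mathbb{R}$ and the proximity operator $\mathrm{prox}_\phi: H\to H$ are defined respectively as

$$\phi_c(z) = \min_{u\in H}\Big(\phi(u) + \frac{c}{2}\|u - z\|^2\Big),\qquad \mathrm{prox}_\phi(z) = \operatorname*{argmin}_{u\in H}\Big(\phi(u) + \frac12\|u - z\|^2\Big),$$

for $z\in H$.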

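By definition we have

$$\mathrm{prox}_{\phi/c}(z) = \operatorname*{argmin}_{u\in H}\Big(\frac{\phi(u)}{c} + \frac12\|u - z\|^2\Big) = \operatorname*{argmin}_{u\in H}\Big(\phi(u) + \frac{c}{2}\|u - z\|^2\Big),$$

and

$$\phi_c(z) = \phi\big(\mathrm{prox}_{\phi/c}(z)\big) + \frac{c}{2}\|\mathrm{prox}_{\phi/c}(z) - z\|^2.$$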

We refer interested readers to Tables 10.1 and 10.2 of [3] for closed-form expressions of a number of 
frequently used proximity operators. 
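To make the two formulas above concrete, here is a minimal NumPy sketch for the illustrative choice $H=\mathbb{R}$ and $\phi(u)=\alpha|u|$, whose $\mathrm{prox}_{\phi/c}$ is soft-thresholding with threshold $\alpha/c$. It evaluates $\phi_c$ through $\mathrm{prox}_{\phi/c}$ and compares the result with a brute-force minimization on a grid. The function names are hypothetical helpers, not notation from the paper.

```python
import numpy as np

def prox_abs(z, alpha, c):
    """prox_{phi/c}(z) for phi(u) = alpha*|u|: soft-thresholding with threshold alpha/c."""
    return np.sign(z) * np.maximum(np.abs(z) - alpha / c, 0.0)

def envelope_abs(z, alpha, c):
    """Moreau envelope phi_c(z) = phi(p) + (c/2)*(p - z)**2 with p = prox_{phi/c}(z)."""
    p = prox_abs(z, alpha, c)
    return alpha * np.abs(p) + 0.5 * c * (p - z) ** 2

def envelope_brute_force(z, alpha, c, grid):
    """Approximate min_u alpha*|u| + (c/2)*(u - z)**2 over a fine grid (for checking only)."""
    return np.min(alpha * np.abs(grid) + 0.5 * c * (grid - z) ** 2)

alpha, c = 1.0, 2.0
grid = np.linspace(-5.0, 5.0, 200001)
for z in [-3.0, -0.2, 0.0, 0.4, 2.5]:
    print(z, prox_abs(z, alpha, c), envelope_abs(z, alpha, c),
          envelope_brute_force(z, alpha, c, grid))
```

For $z=2.5$ the sketch gives $\mathrm{prox}_{\phi/c}(z)=2$ and $\phi_c(z)=2.25=\alpha|z|-\alpha^2/(2c)$, i.e., the Moreau envelope of $\alpha|\cdot|$ is a Huber function.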

We recall well-known properties of the Moreau envelope and proximity operator. 

Proposition 2.1 ([21]). Let $z\in H$, $c>0$ and $\phi\in\Gamma_0(H)$.

(a) $0\le\phi(z) - \phi_c(z)$ for all $z\in H$ and all $c>0$.

(b) $\lim_{c\to\infty}\phi_c(z) = \phi(z)$ for all $z\in H$.

(c) The proximity operator $\mathrm{prox}_{\phi/c}$ is firmly nonexpansive (and in particular nonexpansive), that is,

$$\|\mathrm{prox}_{\phi/c}(z) - \mathrm{prox}_{\phi/c}(w)\|^2 \le \big(\mathrm{prox}_{\phi/c}(z) - \mathrm{prox}_{\phi/c}(w), z - w\big) \quad\forall z,w\in H.$$

(d) The Moreau envelope $\phi_c$ is Fréchet differentiable and its gradient is given by

$$D_z\phi_c(z) = c\big(z - \mathrm{prox}_{\phi/c}(z)\big) \quad\forall c>0,\ \forall z\in H. \tag{2.1}$$

(e) The gradient $z\mapsto D_z\phi_c(z)\in H$ is Lipschitz continuous with Lipschitz constant $c$, i.e.,

$$\|D_z\phi_c(z) - D_z\phi_c(w)\| \le c\|z - w\| \quad\forall z,w\in H.$$

(f) The Moreau envelope and the proximity operator of the conjugate $\phi^*$ are related to $\phi_c$ and $\mathrm{prox}_{\phi/c}$, respectively, as

$$\phi_c(z) + (\phi^*)_{1/c}(cz) = \frac{c}{2}\|z\|^2,\qquad \mathrm{prox}_{\phi/c}(z) + \frac1c\,\mathrm{prox}_{c\phi^*}(cz) = z.$$
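Properties (d) and (e) can be checked numerically in a simple case. The sketch below assumes the illustrative choice $\phi(u)=\alpha|u|$ on $H=\mathbb{R}$ (all names hypothetical): it compares the gradient formula (2.1) with central finite differences of $\phi_c$ and verifies that the difference quotients of the gradient never exceed the Lipschitz constant $c$.

```python
import numpy as np

ALPHA, C = 1.0, 2.0  # illustrative choice: phi(u) = ALPHA*|u| on H = R, parameter c = C

def prox(z):
    # prox_{phi/c}(z): soft-thresholding with threshold ALPHA/C
    return np.sign(z) * np.maximum(np.abs(z) - ALPHA / C, 0.0)

def envelope(z):
    # phi_c(z) = phi(prox) + (c/2)|prox - z|^2
    p = prox(z)
    return ALPHA * np.abs(p) + 0.5 * C * (p - z) ** 2

def grad_envelope(z):
    # formula (2.1): D phi_c(z) = c (z - prox_{phi/c}(z))
    return C * (z - prox(z))

zs = np.linspace(-3.0, 3.0, 61)
h = 1e-6
fd = (envelope(zs + h) - envelope(zs - h)) / (2 * h)   # central differences
print(np.max(np.abs(fd - grad_envelope(zs))))          # small: finite-difference error only

# (e): the gradient is Lipschitz continuous with constant c
g = grad_envelope(zs)
ratios = np.abs(np.diff(g)) / np.diff(zs)
print(ratios.max() <= C + 1e-12)
```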

All the results are standard; their proofs can be found in, e.g., [21]. Here we give an alternative proof of (f) based on duality theory.

Proof. For $z\in H$, we define the function $L_z: H\times\mathcal{D}(\phi)\to\mathbb{R}$ by

$$L_z(u,p) := (u,p) - \phi(p) + \frac{1}{2c}\|u - cz\|^2.$$

Clearly, $L_z$ is convex in $u$ and concave in $p$. We claim that $L_z$ possesses a saddle point on $H\times\mathcal{D}(\phi)$. Clearly, $\lim_{\|u\|\to\infty}L_z(u,p) = \infty$ for all $p\in\mathcal{D}(\phi)$. Thus by [4, Chap. 6, Prop. 2.3], we have

$$\inf_u\sup_p L_z(u,p) = \sup_p\inf_u L_z(u,p). \tag{2.2}$$

Now we compute $\inf_u\sup_p L_z(u,p)$ and $\sup_p\inf_u L_z(u,p)$ separately. First, we observe

$$\inf_u\sup_p L_z(u,p) = \inf_u\Big(\sup_p\big((u,p) - \phi(p)\big) + \frac{1}{2c}\|u - cz\|^2\Big) = \inf_u\Big(\phi^*(u) + \frac{1}{2c}\|u - cz\|^2\Big) = (\phi^*)_{1/c}(cz).$$

Meanwhile, since the infimum over $u$ is attained at $u = c(z - p)$, we have

$$\begin{aligned}
\inf_u L_z(u,p) &= \inf_u\Big(\frac{1}{2c}\|u - cz\|^2 + (p,u)\Big) - \phi(p) = \frac{1}{2c}\|cp\|^2 + (p, c(z - p)) - \phi(p)\\
&= c(p,z) - \frac{c}{2}\|p\|^2 - \phi(p) = \frac{c}{2}\|z\|^2 - \Big(\phi(p) + \frac{c}{2}\|p - z\|^2\Big).
\end{aligned}$$

Thus, we deduce

$$\sup_p\inf_u L_z(u,p) = \sup_p\Big(\frac{c}{2}\|z\|^2 - \Big(\phi(p) + \frac{c}{2}\|p - z\|^2\Big)\Big) = \frac{c}{2}\|z\|^2 - \phi_c(z).$$

Therefore, from (2.2) we have

$$(\phi^*)_{1/c}(cz) = \inf_u\sup_p L_z(u,p) = \sup_p\inf_u L_z(u,p) = \frac{c}{2}\|z\|^2 - \phi_c(z),$$

which shows the first relation. Differentiating both sides of this equation with respect to $z$ and using (2.1) yields the second relation. □
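Both identities in (f) can be checked on a one-dimensional example. The sketch below assumes $\phi(u)=\alpha|u|$, whose conjugate $\phi^*$ is the characteristic function of $[-\alpha,\alpha]$, so that $\mathrm{prox}_{c\phi^*}$ is the projection onto that interval and $(\phi^*)_{1/c}(w)=\frac{1}{2c}\operatorname{dist}(w,[-\alpha,\alpha])^2$. The helper names are hypothetical.

```python
import numpy as np

ALPHA, C = 1.0, 2.0  # illustrative: phi(u) = ALPHA*|u|, phi* = characteristic function of [-ALPHA, ALPHA]

def prox_phi_over_c(z):
    # prox_{phi/c}: soft-thresholding with threshold ALPHA/C
    return np.sign(z) * np.maximum(np.abs(z) - ALPHA / C, 0.0)

def env_phi(z):
    # Moreau envelope phi_c (a Huber function)
    p = prox_phi_over_c(z)
    return ALPHA * np.abs(p) + 0.5 * C * (p - z) ** 2

def prox_c_conj(w):
    # prox_{c phi*}: projection onto [-ALPHA, ALPHA] (scaling a characteristic function leaves it unchanged)
    return np.clip(w, -ALPHA, ALPHA)

def env_conj(w):
    # (phi*)_{1/c}(w) = min_{|u| <= ALPHA} (1/(2c)) (u - w)^2
    return (prox_c_conj(w) - w) ** 2 / (2 * C)

z = np.linspace(-3.0, 3.0, 13)
print(np.allclose(env_phi(z) + env_conj(C * z), 0.5 * C * z ** 2))   # first identity in (f)
print(np.allclose(prox_phi_over_c(z) + prox_c_conj(C * z) / C, z))   # second identity in (f)
```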

The Moreau envelope and the proximity operator provide equivalent expressions of the inclusion $\lambda\in\partial\phi(z)$.

Proposition 2.2. Let $c>0$ be an arbitrary fixed constant and $\phi\in\Gamma_0(H)$. Then the following conditions are equivalent.

(a) $\lambda\in\partial\phi(z)$.

(b) $z - \mathrm{prox}_{\phi/c}(z + \lambda/c) = 0$.

(c) $\phi(z) = \phi_c(z + \lambda/c) - \frac{1}{2c}\|\lambda\|^2$.

Proof. Let the pair $(z,\lambda)$ satisfy the condition $\lambda\in\partial\phi(z)$. This can be expressed as

$$0\in\partial\phi(z) + c\big(z - (z + \lambda/c)\big) = \partial_u\Big(\phi(u) + \frac{c}{2}\|u - (z + \lambda/c)\|^2\Big)\Big|_{u=z},$$

which is equivalent to $z = \mathrm{prox}_{\phi/c}(z + \lambda/c)$. This shows the equivalence between (a) and (b). Next we show that (b) implies (c). Suppose $z - \mathrm{prox}_{\phi/c}(z + \lambda/c) = 0$. Then by the definition of $\phi_c$, it follows that

$$\begin{aligned}
\phi_c(z + \lambda/c) &= \phi\big(\mathrm{prox}_{\phi/c}(z + \lambda/c)\big) + \frac{c}{2}\|\mathrm{prox}_{\phi/c}(z + \lambda/c) - (z + \lambda/c)\|^2\\
&= \phi(z) + \frac{c}{2}\|z - (z + \lambda/c)\|^2 = \phi(z) + \frac{1}{2c}\|\lambda\|^2.
\end{aligned}$$

Finally, we show that (c) implies (a). By the definition of the Moreau envelope, it follows that

$$\phi_c(z + \lambda/c) \le \phi(u) + \frac{c}{2}\|u - (z + \lambda/c)\|^2 \quad\forall u\in H,$$

which, using (c) and expanding the square, can be equivalently written as

$$\phi(z) = \phi_c(z + \lambda/c) - \frac{1}{2c}\|\lambda\|^2 \le \phi(u) + \frac{c}{2}\|u - z\|^2 + (u - z, -\lambda) \quad\forall u\in H.$$

This implies that the strictly convex function $u\mapsto\phi(u) + \frac{c}{2}\|u - z\|^2 + (u - z, -\lambda)$ attains its minimum at $z$. Thus

$$0\in\partial_u\Big(\phi(u) + \frac{c}{2}\|u - z\|^2 + (u - z, -\lambda)\Big)\Big|_{u=z} = \partial\phi(z) - \lambda,$$

which proves that (c) implies (a). □
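The equivalence in Proposition 2.2 can be observed on sample pairs $(z,\lambda)$. The following sketch assumes $\phi(u)=\alpha|u|$ on $\mathbb{R}$, for which $\partial\phi(z)=\{\alpha\,\mathrm{sign}(z)\}$ if $z\ne0$ and $[-\alpha,\alpha]$ if $z=0$; the three predicates (hypothetical names) test conditions (a), (b) and (c) up to a small tolerance.

```python
import numpy as np

ALPHA, C = 1.0, 2.0  # illustrative: phi(u) = ALPHA*|u| on R

def prox(z):
    return np.sign(z) * np.maximum(np.abs(z) - ALPHA / C, 0.0)

def envelope(z):
    p = prox(z)
    return ALPHA * abs(p) + 0.5 * C * (p - z) ** 2

def cond_a(z, lam, tol=1e-12):
    # lam in d phi(z)
    if z == 0.0:
        return abs(lam) <= ALPHA + tol
    return abs(lam - ALPHA * np.sign(z)) <= tol

def cond_b(z, lam, tol=1e-12):
    # z - prox_{phi/c}(z + lam/c) = 0
    return abs(z - prox(z + lam / C)) <= tol

def cond_c(z, lam, tol=1e-12):
    # phi(z) = phi_c(z + lam/c) - lam^2/(2c)
    return abs(ALPHA * abs(z) - (envelope(z + lam / C) - lam ** 2 / (2 * C))) <= tol

pairs = [(0.0, 0.3), (0.0, -1.0), (0.0, 1.5), (2.0, 1.0), (2.0, 0.5), (-1.0, -1.0)]
for z, lam in pairs:
    print(z, lam, cond_a(z, lam), cond_b(z, lam), cond_c(z, lam))  # the three flags agree
```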

3 The optimality systems 

For classical optimization problems with a smooth cost function and equality constraints given by smooth maps, it is well known that saddle points are characterized by the Lagrange optimality system of the (standard) Lagrangian associated with the optimization problem. In this section, we show that the augmented Lagrangian generalizes this classical result to the nonsmooth convex optimization problem (1.1).

Proposition 3.1. Let $c>0$, let $f$ be convex and continuously differentiable, and let $\phi\in\Gamma_0(H)$. The augmented Lagrangian $L_c$ has the following properties.

(a) $L_c(x,\lambda)$ is finite for all $x\in X$ and all $\lambda\in H$.

(b) $L_c$ is convex and continuously differentiable with respect to $x$, and concave and continuously differentiable with respect to $\lambda$. Further, for all $(x,\lambda)\in X\times H$ and all $c>0$, the gradients $D_xL_c$ and $D_\lambda L_c$ are given respectively by

$$D_xL_c(x,\lambda) = D_xf(x) + cE^*\big(Ex + \lambda/c - \mathrm{prox}_{\phi/c}(Ex + \lambda/c)\big), \tag{3.1}$$

$$D_\lambda L_c(x,\lambda) = Ex - \mathrm{prox}_{\phi/c}(Ex + \lambda/c). \tag{3.2}$$

(c) $D_xL_c(x,\lambda)$ can be expressed in terms of $D_\lambda L_c(x,\lambda)$ by

$$D_xL_c(x,\lambda) = D_xf(x) + E^*\big(\lambda + cD_\lambda L_c(x,\lambda)\big). \tag{3.3}$$

Proof. All the assertions follow directly from the differentiability and convexity of $f$ and Proposition 2.1. □
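The gradient formulas (3.1)-(3.3) lend themselves to a finite-difference check. The sketch below works in finite dimensions with a hypothetical quadratic $f(x)=\frac12(x,Ax)-(b,x)$, $\phi=\alpha\|\cdot\|_{\ell^1}$ and a random matrix $E$ (so $E^*=E^T$); it evaluates $L_c$ through the closed form $f(x)+\phi_c(Ex+\lambda/c)-\frac{1}{2c}\|\lambda\|^2$ and compares (3.1)-(3.2) with central differences. The sizes, seed and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, c, alpha = 4, 3, 2.0, 0.7                    # illustrative sizes and parameters
R = rng.standard_normal((n, n))
A = R @ R.T + n * np.eye(n)                         # SPD matrix of a hypothetical quadratic f
b = rng.standard_normal(n)
E = rng.standard_normal((m, n))

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad_f(x):
    return A @ x - b

def prox(z):
    # prox_{phi/c} for phi = alpha*||.||_1 (soft-thresholding)
    return np.sign(z) * np.maximum(np.abs(z) - alpha / c, 0.0)

def envelope(z):
    # Moreau envelope phi_c(z) = phi(p) + (c/2)||p - z||^2 with p = prox_{phi/c}(z)
    p = prox(z)
    return alpha * np.abs(p).sum() + 0.5 * c * np.sum((p - z) ** 2)

def aug_lagrangian(x, lam):
    # L_c(x, lam) = f(x) + phi_c(Ex + lam/c) - ||lam||^2 / (2c)
    return f(x) + envelope(E @ x + lam / c) - lam @ lam / (2 * c)

def grad_x(x, lam):
    # formula (3.1)
    z = E @ x + lam / c
    return grad_f(x) + c * E.T @ (z - prox(z))

def grad_lam(x, lam):
    # formula (3.2)
    return E @ x - prox(E @ x + lam / c)

def fd_grad(fun, v, h=1e-6):
    # central finite differences, one coordinate direction at a time
    return np.array([(fun(v + h * e) - fun(v - h * e)) / (2 * h) for e in np.eye(v.size)])

x, lam = rng.standard_normal(n), rng.standard_normal(m)
print(np.allclose(grad_x(x, lam), fd_grad(lambda v: aug_lagrangian(v, lam), x), atol=1e-5))
print(np.allclose(grad_lam(x, lam), fd_grad(lambda v: aug_lagrangian(x, v), lam), atol=1e-5))
# (3.3): D_x L_c = D_x f + E^T (lam + c D_lam L_c)
print(np.allclose(grad_x(x, lam), grad_f(x) + E.T @ (lam + c * grad_lam(x, lam))))
```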

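Theorem 3.1. Let $c>0$, let $f$ be convex and continuously differentiable, and let $\phi\in\Gamma_0(H)$. The following conditions on a pair $(\bar x,\bar\lambda)\in X\times H$ are equivalent.

(a) (optimality system) The pair $(\bar x,\bar\lambda)$ satisfies the optimality system

$$D_xf(\bar x) + E^*\bar\lambda = 0 \quad\text{and}\quad \bar\lambda\in\partial\phi(E\bar x). \tag{3.4}$$

(b) (Lagrange optimality system) The pair $(\bar x,\bar\lambda)$ satisfies the Lagrange optimality system

$$D_xL_c(\bar x,\bar\lambda) = 0 \quad\text{and}\quad D_\lambda L_c(\bar x,\bar\lambda) = 0, \tag{3.5}$$

where the gradients of $L_c$ with respect to $x$ and $\lambda$ are given by (3.1) and (3.2), respectively. More precisely, $(\bar x,\bar\lambda)$ satisfies the nonlinear system

$$\begin{cases}
D_xf(\bar x) + cE^*\big(E\bar x + \bar\lambda/c - \mathrm{prox}_{\phi/c}(E\bar x + \bar\lambda/c)\big) = 0,\\
E\bar x - \mathrm{prox}_{\phi/c}(E\bar x + \bar\lambda/c) = 0.
\end{cases}$$

(c) (saddle point) The pair $(\bar x,\bar\lambda)$ is a saddle point of $L_c$:

$$L_c(\bar x,\lambda) \le L_c(\bar x,\bar\lambda) \le L_c(x,\bar\lambda) \quad\forall x\in X,\ \forall\lambda\in H. \tag{3.6}$$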


Proof. First we show the equivalence between (a) and (b). Suppose that (a) holds. By Proposition 2.2, the inclusion $\bar\lambda\in\partial\phi(E\bar x)$ is equivalent to the equation $E\bar x - \mathrm{prox}_{\phi/c}(E\bar x + \bar\lambda/c) = 0$. Hence, from (3.2) we have

$$D_\lambda L_c(\bar x,\bar\lambda) = E\bar x - \mathrm{prox}_{\phi/c}(E\bar x + \bar\lambda/c) = 0.$$

Thus

$$D_xL_c(\bar x,\bar\lambda) = D_xf(\bar x) + E^*\big(\bar\lambda + cD_\lambda L_c(\bar x,\bar\lambda)\big) = D_xf(\bar x) + E^*\bar\lambda = 0,$$

by Proposition 3.1(c). Similarly, we can show that (b) implies (a).

Next we show the equivalence between (b) and (c). If $(\bar x,\bar\lambda)$ satisfies the Lagrange optimality system, then by the convexity of $L_c(\cdot,\bar\lambda)$ we have

$$L_c(x,\bar\lambda) - L_c(\bar x,\bar\lambda) \ge \langle D_xL_c(\bar x,\bar\lambda), x - \bar x\rangle_{X^*,X} = 0 \quad\forall x\in X.$$

Similarly, by the concavity of $L_c(\bar x,\cdot)$, we deduce $L_c(\bar x,\lambda)\le L_c(\bar x,\bar\lambda)$ for all $\lambda\in H$.

Conversely, suppose that $(\bar x,\bar\lambda)$ is a saddle point. The second inequality in (3.6) indicates that $\bar x$ is a minimizer of the function $L_c(\cdot,\bar\lambda)$, which implies that $D_xL_c(\bar x,\bar\lambda) = 0$. A similar argument, based on the first inequality, shows that $D_\lambda L_c(\bar x,\bar\lambda) = 0$. □

Corollary 3.1. If one of the conditions in Theorem 3.1 holds, then $\bar x$ is a solution of problem (1.1).

Proof. Assume that there exists a pair $(\bar x,\bar\lambda)$ satisfying the optimality system (3.4). The system implies that $0\in D_xf(\bar x) + E^*\partial\phi(E\bar x)$. By [4, Chap. 1, Prop. 5.7] we have

$$E^*\partial\phi(Ex)\subset\partial(\phi\circ E)(x) \quad\forall x\in X.$$

Therefore it follows that

$$0\in D_xf(\bar x) + E^*\partial\phi(E\bar x)\subset D_xf(\bar x) + \partial(\phi\circ E)(\bar x) = \partial(f + \phi\circ E)(\bar x),$$

which shows that $\bar x$ is a solution of the minimization problem (1.1). □

Remark 3.2. We refer to [7, Chap. 4] for a sufficient condition for the existence of a pair satisfying 
the optimality system (3.4). 

Corollary 3.2. The Lagrange optimality system can also be written as

$$D_xf(x) + E^*\lambda = 0 \quad\text{and}\quad Ex - \mathrm{prox}_{\phi/c}(Ex + \lambda/c) = 0. \tag{3.7}$$

Proof. It follows directly from Proposition 3.1, (3.2) and (3.3). □ 
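To illustrate Theorem 3.1 numerically, the sketch below uses a hypothetical separable test problem with a known solution: $f(x)=\frac12\|x-b\|^2$, $E=I$ and $\phi=\alpha\|\cdot\|_{\ell^1}$, for which $\bar x$ is the soft-thresholding of $b$ at level $\alpha$ and $\bar\lambda=b-\bar x$ (from $D_xf(\bar x)+E^*\bar\lambda=0$). It evaluates the residual of the Lagrange optimality system (3.5) via (3.1)-(3.2); by Theorem 3.1 this residual should vanish for every $c>0$.

```python
import numpy as np

alpha = 1.0
b = np.array([2.5, -0.3, 0.0, -1.7, 0.9])          # illustrative data
# Hypothetical test problem: f(x) = 0.5*||x - b||^2, E = I, phi = alpha*||.||_1.
x_bar = np.sign(b) * np.maximum(np.abs(b) - alpha, 0.0)   # known solution (soft-thresholding)
lam_bar = b - x_bar                                       # from D_x f(x_bar) + lam_bar = 0

def residual(x, lam, c):
    # (D_x L_c, D_lam L_c) from (3.1)-(3.2) with E = I
    z = x + lam / c
    p = np.sign(z) * np.maximum(np.abs(z) - alpha / c, 0.0)   # prox_{phi/c}(z)
    return np.concatenate([(x - b) + c * (z - p), x - p])

for c in (0.5, 1.0, 10.0):
    print(c, np.linalg.norm(residual(x_bar, lam_bar, c)))    # zero up to rounding for every c
```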

The Lagrange optimality system (3.5) is closely related to the optimality system derived in [17, 22], which is formulated using the generalized Moreau-Yosida approximation $\psi_c(z,\lambda)$ defined by

$$\psi_c(z,\lambda) = \phi_c(z + \lambda/c) - \frac{1}{2c}\|\lambda\|^2,$$

so that $L_c(x,\lambda) = f(x) + \psi_c(Ex,\lambda)$. Let us assume that a pair $(\bar x,\bar\lambda)\in X\times H$ satisfies the optimality system (3.4). It is shown in [17, Thm. 4.5] that the pair satisfies the following optimality condition for every $c>0$:

$$\bar x = \operatorname*{argmin}_{x\in X}L_c(x,\bar\lambda) \quad\text{and}\quad \bar\lambda = (D_z\psi_c)(E\bar x,\bar\lambda).$$

The first relation implies the inequality $L_c(\bar x,\bar\lambda)\le L_c(x,\bar\lambda)$ for all $x\in X$, which is the second inequality of (3.6). Meanwhile, by the definition of $\psi_c$ and Proposition 2.1(d), we have

$$\begin{aligned}
(D_z\psi_c)(E\bar x,\bar\lambda) &= D\phi_c(E\bar x + \bar\lambda/c)\\
&= c\big(E\bar x + \bar\lambda/c - \mathrm{prox}_{\phi/c}(E\bar x + \bar\lambda/c)\big)\\
&= \bar\lambda + c\big(E\bar x - \mathrm{prox}_{\phi/c}(E\bar x + \bar\lambda/c)\big).
\end{aligned}$$

In view of (3.2), the second relation implies $D_\lambda L_c(\bar x,\bar\lambda) = 0$, which is the second equation of the Lagrange optimality system (3.5). Alternatively, the following optimality condition in the form of equations is given in [22]:

$$D_xf(\bar x) + E^*\bar\lambda = 0 \quad\text{and}\quad \bar\lambda = (D_z\psi_c)(E\bar x,\bar\lambda).$$

Similarly, one can show that this optimality system is equivalent to (3.7).

4 Linear Newton method for the Lagrange optimality system 

In this section, we present a linear Newton method for the nonsmooth optimization problem (1.1) on 
the basis of the Lagrange optimality system. We also illustrate the method on two elementary examples. 
To keep the presentation simple, we restrict our discussions to finite-dimensional spaces. 

4.1 Linear Newton method 

We begin with the concept of linear Newton approximation, which provides a building block for designing 
Newton type algorithms for problem (1.1). For a comprehensive treatment and for further references 
on the subject one may refer to [20]. 

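Definition 4.1. Let $\Phi:\mathbb{R}^m\to\mathbb{R}^n$ be locally Lipschitz continuous. We say that the map $\Phi$ admits a linear Newton approximation (LNA) at $\bar\xi\in\mathbb{R}^m$ if there exists a set-valued map $T:\mathbb{R}^m\rightrightarrows\mathbb{R}^{n\times m}$ such that:

(a) the set of matrices $T(\xi)$ is nonempty and compact for each $\xi\in\mathbb{R}^m$;

(b) $T$ is upper semicontinuous at $\bar\xi$;

(c) the following limit holds:

$$\lim_{\xi\to\bar\xi,\ \xi\ne\bar\xi}\ \sup_{V\in T(\xi)}\frac{\|\Phi(\xi) + V(\bar\xi - \xi) - \Phi(\bar\xi)\|}{\|\xi - \bar\xi\|} = 0.$$

We also say that $T$ is a linear Newton approximation scheme of $\Phi$.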

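A linear Newton iteration for solving the nonlinear equation $\Phi(\xi) = 0$ is defined by

$$\xi^{k+1} = \xi^k - V_k^{-1}\Phi(\xi^k) \quad\text{with } V_k\in T(\xi^k). \tag{4.1}$$

The local convergence of the iteration is ensured if every matrix in $T$ at the solution is nonsingular, as the following theorem states.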
Theorem 4.2 ([20, Thm. 7.5.15]). Let $\Phi:\mathbb{R}^n\to\mathbb{R}^n$ be locally Lipschitz continuous and admit an LNA $T$ at $\xi^*\in\mathbb{R}^n$ such that $\Phi(\xi^*) = 0$. If every matrix $V\in T(\xi^*)$ is nonsingular, then the iterates (4.1) converge superlinearly to the solution $\xi^*$, provided that $\xi^0$ is sufficiently close to $\xi^*$.
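The iteration (4.1) can be sketched generically as follows. This is a minimal illustration, not the paper's code: `Phi` returns the residual, and `V_of` returns one element of an LNA scheme (here an element of the limiting Jacobian). The two-dimensional semismooth test system is a hypothetical example; it is the gradient of a strongly convex, piecewise smooth function, so it has a unique root.

```python
import numpy as np

def linear_newton(Phi, V_of, xi0, tol=1e-12, max_iter=50):
    """Linear Newton iteration (4.1): xi^{k+1} = xi^k - V_k^{-1} Phi(xi^k), V_k in T(xi^k).

    Phi  -- callable returning the residual vector Phi(xi)
    V_of -- callable returning ONE element V of an LNA scheme T(xi)
    Returns the final iterate and the history of residual norms.
    """
    xi = np.array(xi0, dtype=float)
    history = [np.linalg.norm(Phi(xi))]
    for _ in range(max_iter):
        if history[-1] <= tol:
            break
        xi = xi - np.linalg.solve(V_of(xi), Phi(xi))   # solve V_k d = Phi(xi^k) instead of inverting
        history.append(np.linalg.norm(Phi(xi)))
    return xi, history

# Hypothetical semismooth test system:
#   Phi(xi) = (2 xi_1 + xi_2 + max(xi_1, 0) - 3,  xi_1 + 3 xi_2 + xi_2**3 - 4)
def Phi(xi):
    return np.array([2 * xi[0] + xi[1] + max(xi[0], 0.0) - 3.0,
                     xi[0] + 3 * xi[1] + xi[1] ** 3 - 4.0])

def V_of(xi):
    # element of the limiting Jacobian: derivative of max(t, 0) taken as 1 if t > 0, else 0
    return np.array([[2.0 + (1.0 if xi[0] > 0 else 0.0), 1.0],
                     [1.0, 3.0 + 3 * xi[1] ** 2]])

xi, history = linear_newton(Phi, V_of, [-1.0, 0.0])
print(xi, history)   # residual norms drop rapidly once the iterates approach the root
```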

In addition to the Newton iteration (4.1), one can also define an inexact version of the linear Newton method, the Levenberg-Marquardt (LM) method and an inexact LM method, and establish their local convergence and convergence rates; see, e.g., [20]. The linear Newton method for the Lagrange optimality system developed later in this section can be extended to these methods along similar lines, but we restrict ourselves to the basic Newton method (4.1).

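To provide a class of Lipschitz maps that admit an LNA, we make use of the notions of generalized Jacobian and semismoothness. Let $\Phi:\mathbb{R}^m\to\mathbb{R}^n$ be a locally Lipschitz continuous map. Rademacher's theorem [23, Sect. 3.1.2] states that a locally Lipschitz continuous map is differentiable almost everywhere. Denote by $N_\Phi$ a set of measure zero such that $\Phi$ is differentiable on $\mathbb{R}^m\setminus N_\Phi$. The limiting Jacobian of $\Phi$ at $\xi$ is the set

$$\partial_B\Phi(\xi) := \Big\{G\in\mathbb{R}^{n\times m} \;\Big|\; G = \lim_{k\to\infty}D\Phi(\xi^k) \text{ for some sequence } \{\xi^k\}\subset\mathbb{R}^m\setminus N_\Phi \text{ with } \xi^k\to\xi\Big\}.$$

The (Clarke) generalized Jacobian $\partial\Phi(\xi)$ of $\Phi$ at $\xi\in\mathbb{R}^m$ is the convex hull of the limiting Jacobian:

$$\partial\Phi(\xi) = \operatorname{conv}\big(\partial_B\Phi(\xi)\big).$$

We denote by $\partial_B\Phi$ the set-valued map $\xi\mapsto\partial_B\Phi(\xi)$ for $\xi\in\mathbb{R}^m$. The set-valued map $\partial\Phi$ for the generalized Jacobian is defined analogously.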
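For intuition, the limiting Jacobian of the scalar map $t\mapsto\max(t,0)$ at $t=0$ is $\{0,1\}$, and its generalized Jacobian is $[0,1]$. The sketch below, a crude heuristic rather than a general algorithm, mimics the definition by collecting derivatives at random differentiable points close to $\xi$; for this piecewise linear map the distinct values are exactly the elements of $\partial_B$. The function names are hypothetical.

```python
import numpy as np

def derivative_of_max0(t):
    # max(t, 0) is differentiable for t != 0, with derivative 1 (t > 0) or 0 (t < 0)
    if t == 0.0:
        raise ValueError("not differentiable at 0")
    return 1.0 if t > 0 else 0.0

def sampled_limiting_jacobian(xi, radius=1e-8, samples=1000, seed=0):
    """Collect derivatives at random differentiable points near xi (heuristic illustration)."""
    rng = np.random.default_rng(seed)
    pts = xi + radius * rng.uniform(-1.0, 1.0, samples)
    return sorted({derivative_of_max0(t) for t in pts if t != 0.0})

print(sampled_limiting_jacobian(0.0))   # [0.0, 1.0]: d_B = {0, 1}, generalized Jacobian = [0, 1]
print(sampled_limiting_jacobian(2.0))   # [1.0]: the map is smooth near 2
```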

A natural choice for an LNA scheme of a locally Lipschitz map is its limiting or generalized Jacobian. Without additional assumptions on $\Phi$, however, this attempt fails, because neither of them necessarily satisfies the approximation property (c) in Definition 4.1. This drawback can be remedied by employing the notion of semismoothness, which narrows down the class of Lipschitz maps so that each of $\partial\Phi$ and $\partial_B\Phi$ provides an LNA scheme of the map.

Definition 4.3. Let $\Phi:\mathbb{R}^m\to\mathbb{R}^n$ be a locally Lipschitz map. We say that $\Phi$ is semismooth at $\bar\xi\in\mathbb{R}^m$ if $\Phi$ is directionally differentiable near $\bar\xi$ and the following limit holds:

$$\lim_{\xi\to\bar\xi,\ \xi\ne\bar\xi}\frac{\|\Phi'(\xi;\xi - \bar\xi) - \Phi'(\bar\xi;\xi - \bar\xi)\|}{\|\xi - \bar\xi\|} = 0,$$

where $\Phi'(\xi;h)$ denotes the directional derivative of $\Phi$ at $\xi\in\mathbb{R}^m$ along the direction $h\in\mathbb{R}^m$.

Proposition 4.1. Assume that a locally Lipschitz map $\Phi:\mathbb{R}^m\to\mathbb{R}^n$ is semismooth at $\bar\xi\in\mathbb{R}^m$. Then each of $\partial\Phi$ and $\partial_B\Phi$ defines an LNA scheme of $\Phi$ at $\bar\xi$.

Proof. It follows from [20, Prop. 7.1.4] that the set-valued map $\partial\Phi$ satisfies conditions (a) and (b) of Definition 4.1, while, by [20, Thm. 7.4.3], it satisfies condition (c). We refer to [20, Prop. 7.5.16] for the proof for the limiting Jacobian. □


4.2 Linear Newton method for the Lagrange optimality system 


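We are ready to present a Newton algorithm for the Lagrange optimality system. Let the map $\Phi_c:\mathbb{R}^n\times\mathbb{R}^m\to\mathbb{R}^n\times\mathbb{R}^m$ be defined by

$$\Phi_c(x,\lambda) = \begin{bmatrix} D_xL_c(x,\lambda)\\ D_\lambda L_c(x,\lambda)\end{bmatrix}.$$

Proposition 3.1 shows that this map is the difference of a smooth part and a nonsmooth part,

$$\Phi_c(x,\lambda) = \Phi_s(x,\lambda) - \Phi_{ns}(x,\lambda),$$

where

$$\Phi_s(x,\lambda) := \begin{bmatrix} D_xf(x) + cE^TEx + E^T\lambda\\ Ex\end{bmatrix} \quad\text{and}\quad \Phi_{ns}(x,\lambda) := \begin{bmatrix} cE^T\mathrm{prox}_{\phi/c}(Ex + \lambda/c)\\ \mathrm{prox}_{\phi/c}(Ex + \lambda/c)\end{bmatrix}.$$

The Jacobian of $\Phi_s(x,\lambda)$ is

$$D_{x,\lambda}\Phi_s(x,\lambda) = \begin{bmatrix} D_x^2f(x) + cE^TE & E^T\\ E & 0\end{bmatrix},$$

and the (matrix-valued) map $D_{x,\lambda}\Phi_s$ defines an LNA scheme of the smooth map $\Phi_s$ at every point $(x,\lambda)$. By the sum rule (see, e.g., [20, Thm. 7.5.18]), an LNA scheme of $\Phi_c$ is provided by $T = D_{x,\lambda}\Phi_s - T_{ns}$, where $T_{ns}$ is an LNA scheme of $\Phi_{ns}$. The next result shows that the task of determining $T_{ns}$ reduces to that of computing an LNA scheme of the proximity operator.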

Lemma 4.1. Let $\phi\in\Gamma_0(\mathbb{R}^m)$ and $c>0$. Let $T_p$ be an LNA scheme of the proximity operator $\mathrm{prox}_{\phi/c}$. Then the set-valued map

$$T_{ns}(x,\lambda) := \left\{\begin{bmatrix} cE^T\\ I\end{bmatrix} G \begin{bmatrix} E & c^{-1}I\end{bmatrix} \;\middle|\; G\in T_p(Ex + \lambda/c)\right\}\subset\mathbb{R}^{(n+m)\times(n+m)}$$

is an LNA scheme of the map $\Phi_{ns}$.

Proof. Since $T_p$ is upper semicontinuous and the sets $T_p(z)$ are compact by definition, so is the set-valued map $(x,\lambda)\mapsto T_{ns}(x,\lambda)$, which implies that $T_{ns}$ satisfies conditions (a) and (b) in Definition 4.1. One can verify that $T_{ns}$ satisfies condition (c) in the definition by employing the sum rule [20, Thm. 7.5.18] and the chain rule [20, Thm. 7.5.17]. □
We now turn to defining a possible LNA scheme of a proximity operator. By Proposition 2.1, the proximity operator is nonexpansive and therefore Lipschitz continuous. Hence the limiting Jacobian $\partial_B(\mathrm{prox}_{\phi/c})(z)$ is well defined for all $z\in\mathbb{R}^m$, and so is the generalized Jacobian $\partial(\mathrm{prox}_{\phi/c})(z)$. The next result, due to [24, Thm. 3.2], gives the basic properties of the generalized Jacobian of the proximity operator.

Proposition 4.2. For any $\phi\in\Gamma_0(\mathbb{R}^m)$, every $G\in\partial(\mathrm{prox}_\phi)(z)$ is a symmetric positive semidefinite matrix with $\|G\|\le 1$.


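Now we can specify an LNA scheme of the map $D_{x,\lambda}L_c$ at $(x,\lambda)$.

Proposition 4.3. Let $\phi\in\Gamma_0(\mathbb{R}^m)$ and $c>0$. Assume that the proximity operator $\mathrm{prox}_{\phi/c}$ is semismooth. Then the set-valued map $T:\mathbb{R}^n\times\mathbb{R}^m\rightrightarrows\mathbb{R}^{(n+m)\times(n+m)}$ defined by

$$T(x,\lambda) := \left\{\begin{bmatrix} D_x^2f(x) + cE^T(I - G)E & ((I - G)E)^T\\ (I - G)E & -c^{-1}G\end{bmatrix} \;\middle|\; G\in\partial(\mathrm{prox}_{\phi/c})(z)\right\}, \tag{4.2}$$

with $z = Ex + \lambda/c$, is an LNA scheme of the map $\Phi_c$ at $(x,\lambda)\in\mathbb{R}^n\times\mathbb{R}^m$.

Proof. The symmetry of the generalized Jacobian of a proximity operator allows us to write $E^TG = (GE)^T$ for $G\in\partial(\mathrm{prox}_{\phi/c})(Ex + \lambda/c)$, which yields

$$\begin{bmatrix} D_x^2f(x) + cE^T(I - G)E & ((I - G)E)^T\\ (I - G)E & -c^{-1}G\end{bmatrix} = \begin{bmatrix} D_x^2f(x) + cE^TE & E^T\\ E & 0\end{bmatrix} - \begin{bmatrix} cE^T\\ I\end{bmatrix} G \begin{bmatrix} E & c^{-1}I\end{bmatrix}.$$

From Proposition 4.1 and the assumption that $\mathrm{prox}_{\phi/c}$ is semismooth, it follows that the generalized Jacobian $\partial(\mathrm{prox}_{\phi/c})$ is an LNA scheme of the proximity operator $\mathrm{prox}_{\phi/c}$, which together with Lemma 4.1 shows that $T_{ns}(x,\lambda)$ with $T_p(Ex + \lambda/c) = \partial(\mathrm{prox}_{\phi/c})(Ex + \lambda/c)$ defines an LNA scheme of $\Phi_{ns}$ at $(x,\lambda)$. Thus $T = D_{x,\lambda}\Phi_s - T_{ns}$ defines an LNA scheme of $\Phi_c$ at $(x,\lambda)$. □

Remark 4.4. One can replace the generalized Jacobian $\partial(\mathrm{prox}_{\phi/c})(z)$ in (4.2) with the limiting Jacobian $\partial_B(\mathrm{prox}_{\phi/c})(z)$.

Remark 4.5. The class of semismooth maps is broad enough to include a variety of proximity operators frequently encountered in practice; see, e.g., [24, Sect. 5].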


The proposed algorithm is given in Algorithm 1. 
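Algorithm 1 Linear Newton algorithm for the Lagrange optimality system.

1: Choose $(x^0,\lambda^0)\in\mathbb{R}^n\times\mathbb{R}^m$ and set $k = 0$.

2: If $\Phi_c(x^k,\lambda^k) = 0$, stop.

3: Let $z^k = Ex^k + \lambda^k/c$, and compute an element $G_k$ of the generalized Jacobian $\partial(\mathrm{prox}_{\phi/c})(z^k)$.

4: Compute a direction $(d_x^k, d_\lambda^k)$ by solving

$$\begin{bmatrix} D_x^2f(x^k) + cE^T(I - G_k)E & ((I - G_k)E)^T\\ (I - G_k)E & -c^{-1}G_k\end{bmatrix}\begin{bmatrix} d_x^k\\ d_\lambda^k\end{bmatrix} = -\begin{bmatrix} D_xL_c(x^k,\lambda^k)\\ D_\lambda L_c(x^k,\lambda^k)\end{bmatrix}. \tag{4.3}$$

5: Set $x^{k+1} = x^k + d_x^k$ and $\lambda^{k+1} = \lambda^k + d_\lambda^k$.

6: Set $k\leftarrow k + 1$ and go back to Step 2.

Remark 4.6. Proposition 4.1 allows us to replace the generalized Jacobian $\partial(\mathrm{prox}_{\phi/c})(z)$ with the limiting Jacobian $\partial_B(\mathrm{prox}_{\phi/c})(z)$.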

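Remark 4.7. A simple calculation using Theorem 3.1 shows that the update in Steps 4 and 5 can be replaced with

$$\begin{bmatrix} D_x^2f(x^k) & E^T\\ (I - G_k)E & -c^{-1}G_k\end{bmatrix}\begin{bmatrix} x^{k+1}\\ \lambda^{k+1}\end{bmatrix} = \begin{bmatrix} D_x^2f(x^k)x^k - D_xf(x^k)\\ \mathrm{prox}_{\phi/c}(z^k) - G_kz^k\end{bmatrix}. \tag{4.4}$$

Indeed, by (3.3) we have $\Phi_c = M\tilde\Phi_c$, where $M = \begin{bmatrix} I & cE^T\\ 0 & I\end{bmatrix}$ is invertible and $\tilde\Phi_c(x,\lambda) = \begin{bmatrix} D_xf(x) + E^T\lambda\\ Ex - \mathrm{prox}_{\phi/c}(Ex + \lambda/c)\end{bmatrix}$ is the residual of (3.7); moreover, the matrix in (4.3) equals $M$ times the matrix in (4.4). Hence (4.3) is a Newton step for $\tilde\Phi_c = 0$, which rearranges to (4.4).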

The local convergence of Algorithm 1 follows from Theorem 4.2 if every element of $T(\bar x,\bar\lambda)$ defined by (4.2) is nonsingular. The next result gives a sufficient condition for the nonsingularity.

Proposition 4.4. Assume that $E$ is surjective, that $D_x^2f(x)$ is positive definite, and that it is bounded from below uniformly in $x$, that is, there exists a $\delta>0$ such that

$$(D_x^2f(x)d, d)\ge\delta\|d\|^2 \quad\forall d\in\mathbb{R}^n.$$

Then every element of $T(x,\lambda)$ is nonsingular for all $(x,\lambda)$.

Proof. A saddle point matrix of the form

$$\begin{bmatrix} A & B^T\\ B & -C\end{bmatrix},$$

where $A$ is symmetric positive definite and $C$ is symmetric positive semidefinite, is nonsingular if $\ker(C)\cap\ker(B^T) = \{0\}$; see, e.g., [25, Thm. 3.1]. Note that $D_x^2f(x)$ is symmetric positive definite by assumption, and $G$ and $I - G$ are symmetric positive semidefinite, cf. Proposition 4.2. Hence the matrix $D_x^2f(x) + cE^T(I - G)E$ is symmetric positive definite. Now let $d\in\ker(G)\cap\ker\big(((I - G)E)^T\big)$. We then have

$$Gd = 0 \quad\text{and}\quad ((I - G)E)^Td = 0.$$

Appealing again to the symmetry $G^T = G$ from Proposition 4.2, it follows that $E^Td = E^T(I - G)d = ((I - G)E)^Td = 0$. Then the surjectivity of $E$ implies $d\in\ker(E^T) = \operatorname{Im}(E)^\perp = \{0\}$. □
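Proposition 4.4 can be probed numerically. The sketch below samples symmetric matrices $G$ with eigenvalues in $[0,1]$ (the properties guaranteed by Proposition 4.2; these are random stand-ins, not actual prox Jacobians), a random symmetric positive definite Hessian and a random wide matrix $E$ (surjective with probability one), and reports whether the smallest singular value of the matrix in (4.2) stays away from zero. It also shows that without surjectivity the matrix can be singular. All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_prox_jacobian(m, rng):
    # symmetric PSD matrix with eigenvalues in [0, 1] (cf. Proposition 4.2);
    # one eigenvalue is set exactly to 0 and one exactly to 1
    Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
    eig = rng.uniform(0.0, 1.0, m)
    eig[0], eig[-1] = 0.0, 1.0
    return Q @ np.diag(eig) @ Q.T

def newton_matrix(H, E, G, c):
    # element of T(x, lam) in (4.2), with H = D_x^2 f(x)
    IGE = (np.eye(G.shape[0]) - G) @ E
    return np.block([[H + c * E.T @ IGE, IGE.T], [IGE, -G / c]])

n, m, c = 6, 4, 3.0
R = rng.standard_normal((n, n))
H = R @ R.T + np.eye(n)                   # symmetric positive definite Hessian
E = rng.standard_normal((m, n))           # m <= n: surjective with probability one
smallest = min(np.linalg.svd(newton_matrix(H, E, random_prox_jacobian(m, rng), c),
                             compute_uv=False)[-1] for _ in range(100))
print(smallest > 1e-8)                    # nonsingular for every sampled G

# Without surjectivity the conclusion can fail: E with more rows than columns and G = 0.
E_tall = rng.standard_normal((n + 2, n))
V = newton_matrix(H, E_tall, np.zeros((n + 2, n + 2)), c)
print(np.linalg.matrix_rank(V) < V.shape[0])
```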

The local convergence of Algorithm 1 follows from Theorem 4.2, Proposition 4.3 and Proposition 4.4.

Theorem 4.8. Let $f$ be smooth, $\phi\in\Gamma_0(\mathbb{R}^m)$, and $c>0$. Assume that there exists a unique solution $(\bar x,\bar\lambda)$ of the Lagrange optimality system (3.5). Assume also that the assumptions on $f$ and $E$ in Proposition 4.4 are satisfied, and that the proximity operator $\mathrm{prox}_{\phi/c}$ is semismooth on $\mathbb{R}^m$. Then the Newton system (4.3) is solvable, and the sequence generated by Algorithm 1 converges superlinearly to the solution $(\bar x,\bar\lambda)$, provided that the initial guess lies in a sufficiently small neighborhood of $(\bar x,\bar\lambda)$.

4.3 Examples 

We illustrate Algorithm 1 on two examples: bilateral constraints and $\ell^1$ penalty. We begin with a useful result for computing the generalized (limiting) Jacobian of the proximity operator of (block) separable functions [24, Prop. 3.3]. Let $(m_1,\ldots,m_N)$ be a partition of $m$, i.e., $m = \sum_{i=1}^N m_i$, so that $z\in\mathbb{R}^m$ can be decomposed into $N$ blocks of variables $z = (z_1,\ldots,z_N)$ with $z_i\in\mathbb{R}^{m_i}$. The function $\phi\in\Gamma_0(\mathbb{R}^m)$ is said to be (block) separable if $\phi(z) = \sum_{i=1}^N\phi_i(z_i)$ for $N$ functions $\phi_i\in\Gamma_0(\mathbb{R}^{m_i})$.

Proposition 4.5. If $\phi\in\Gamma_0(\mathbb{R}^m)$ is (block) separable, then every element of the generalized Jacobian $\partial(\mathrm{prox}_\phi)(z)$ is a (block) diagonal matrix.

Example 4.9. Let us consider the following optimization problem with bilateral inequality constraints

$$\min_{x\in\mathbb{R}^n} f(x) \quad\text{subject to}\quad a\le Ex\le b,$$

where $f$ is a smooth function, $a,b\in\mathbb{R}^m$ and $E\in\mathbb{R}^{m\times n}$.

The problem can be reformulated into (1.1) with $\phi(z) = I_S(z)$, where $I_S$ is the characteristic function of the set $S = \{z\in\mathbb{R}^m \mid a_j\le z_j\le b_j,\ j = 1,\ldots,m\}$. Clearly, the proximity operator $\mathrm{prox}_{\phi/c}:\mathbb{R}^m\to\mathbb{R}^m$ is the projection onto $S$:

$$\mathrm{prox}_{\phi/c}(z) = \big[\max(a_1,\min(b_1,z_1)),\ldots,\max(a_m,\min(b_m,z_m))\big]^T.$$

Since the proximity operator is separable, by Proposition 4.5 a limiting Jacobian $G\in\partial_B(\mathrm{prox}_{\phi/c})(z)$ is a diagonal matrix with entries

$$G_{jj}\in\begin{cases}\{1\} & \text{if } a_j < z_j < b_j,\\ \{0,1\} & \text{if } z_j\in\{a_j,b_j\},\\ \{0\} & \text{otherwise.}\end{cases}$$

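Now let $(x,\lambda)$ be the current iterate, and $z = Ex + \lambda/c$. We denote by $o$ the index set $\{j \mid G_{jj} = 0\}\subset\{1,2,\ldots,m\}$, and by $i$ its complement. Then $i\cap o = \emptyset$ and $i\cup o = \{1,2,\ldots,m\}$. We denote by $\lambda_o$ the subvector of $\lambda$ consisting of the entries of $\lambda$ whose indices are listed in $o$. The submatrix $E_o$ of $E$ is defined analogously. For example, if $o = \{o_1,o_2,\ldots,o_\ell\}$, where $\ell$ is the number of elements of the set $o$, then $\lambda_o$ is an $\ell\times1$ column vector and $E_o$ is an $\ell\times n$ matrix, given respectively by

$$\lambda_o = \begin{bmatrix}\lambda_{o_1}\\ \lambda_{o_2}\\ \vdots\\ \lambda_{o_\ell}\end{bmatrix} \quad\text{and}\quad E_o = \begin{bmatrix} E_{o_1,1} & E_{o_1,2} & \cdots & E_{o_1,n}\\ E_{o_2,1} & E_{o_2,2} & \cdots & E_{o_2,n}\\ \vdots & \vdots & & \vdots\\ E_{o_\ell,1} & E_{o_\ell,2} & \cdots & E_{o_\ell,n}\end{bmatrix}.$$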

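With the new iterates denoted by $x^+$ and $\lambda^+$, the Newton update (4.4) yields

$$\lambda_i^+ = c\big(z - \mathrm{prox}_{\phi/c}(z)\big)_i$$

and

$$\begin{bmatrix} D_x^2f(x) & E_o^T\\ E_o & 0\end{bmatrix}\begin{bmatrix} x^+\\ \lambda_o^+\end{bmatrix} = \begin{bmatrix} D_x^2f(x)x - D_xf(x) - E_i^T\lambda_i^+\\ \mathrm{prox}_{\phi/c}(z)_o\end{bmatrix}.$$

In this example, we have $z_i = \mathrm{prox}_{\phi/c}(z)_i$, since $a_j\le z_j\le b_j$ for $j\in i$, and the Newton update is further simplified as

$$\begin{bmatrix} D_x^2f(x) & E_o^T\\ E_o & 0\end{bmatrix}\begin{bmatrix} x^+\\ \lambda_o^+\end{bmatrix} = \begin{bmatrix} D_x^2f(x)x - D_xf(x)\\ \mathrm{prox}_{\phi/c}(z)_o\end{bmatrix} \quad\text{and}\quad \lambda_i^+ = 0.$$

In particular, if $f$ is a quadratic function $f(x) = \frac12(x,Ax) - (g,x)$ (we write $g$ for the linear term to avoid a clash with the upper bound $b$), the algorithm reduces to the primal-dual active set algorithm developed in [7, 26]:

$$\begin{bmatrix} A & E_o^T\\ E_o & 0\end{bmatrix}\begin{bmatrix} x^+\\ \lambda_o^+\end{bmatrix} = \begin{bmatrix} g\\ \mathrm{prox}_{\phi/c}(z)_o\end{bmatrix} \quad\text{and}\quad \lambda_i^+ = 0.$$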
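The primal-dual active set update above can be sketched in a few lines of NumPy. This is a minimal illustration for dense matrices with $E_o$ of full row rank; the function name `pdas_box`, the tie-breaking rule (boundary indices are put in $o$), and the $3\times3$ data are hypothetical. For this data one can check by hand that $\bar x=(1,0,-1)$ and $\bar\lambda=(4,0,-4)$ satisfy (3.4), and that the iteration reaches this pair after two updates from $(x,\lambda)=(0,0)$.

```python
import numpy as np

def pdas_box(A, g, E, a, b, c=1.0, tol=1e-10, max_iter=50):
    """Primal-dual active set iteration of Example 4.9 for
       min 0.5*(x, A x) - (g, x)  subject to  a <= E x <= b.
    o = {j : z_j <= a_j or z_j >= b_j} (G_jj = 0, boundary ties assigned to o)."""
    m, n = E.shape
    x, lam = np.zeros(n), np.zeros(m)
    for k in range(max_iter):
        z = E @ x + lam / c
        p = np.clip(z, a, b)                                     # prox_{phi/c}: projection onto the box
        res = np.concatenate([A @ x - g + E.T @ lam, E @ x - p])  # residual of (3.7)
        if np.linalg.norm(res) <= tol:
            return x, lam, k
        o = np.where((z <= a) | (z >= b))[0]
        Eo = E[o]
        K = np.zeros((n + o.size, n + o.size))                  # [[A, E_o^T], [E_o, 0]]
        K[:n, :n], K[:n, n:], K[n:, :n] = A, Eo.T, Eo
        sol = np.linalg.solve(K, np.concatenate([g, p[o]]))
        x = sol[:n]
        lam = np.zeros(m)                                        # lam_i^+ = 0 on the inactive set
        lam[o] = sol[n:]
    return x, lam, max_iter

A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
x, lam, iters = pdas_box(A, g=np.array([8.0, 0.0, -8.0]), E=np.eye(3), a=-1.0, b=1.0)
print(x, lam, iters)   # approximately x = [1, 0, -1], lam = [4, 0, -4], iters = 2
```

The sketch makes no claim about convergence from arbitrary starting points; it only reproduces the update formulas on a small instance.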


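Example 4.10. Consider the following optimization problem

$$\min_{x\in\mathbb{R}^n} f(x) + \alpha\|Ex\|_{\ell^1},$$

where $f$ is a smooth function, $E\in\mathbb{R}^{m\times n}$, $\|\cdot\|_{\ell^1}$ is the $\ell^1$ norm, and $\alpha>0$ is a regularization parameter.

Let $\phi(z) = \alpha\|z\|_{\ell^1}$. Its proximity operator $\mathrm{prox}_{\phi/c}$ is the well-known soft-thresholding operator

$$\mathrm{prox}_{\phi/c}(z) = \big[\mathrm{prox}_{\frac{\alpha}{c}|\cdot|}(z_1),\ldots,\mathrm{prox}_{\frac{\alpha}{c}|\cdot|}(z_m)\big]^T,\qquad \mathrm{prox}_{\frac{\alpha}{c}|\cdot|}(s) = \max\Big(s - \frac{\alpha}{c}, 0\Big) + \min\Big(s + \frac{\alpha}{c}, 0\Big),\quad s\in\mathbb{R}.$$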
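A limiting Jacobian $G\in\partial_B(\mathrm{prox}_{\phi/c})(z)$ is a diagonal matrix with entries

$$G_{jj}\in\begin{cases}\{1\} & \text{if } |z_j| > \alpha/c,\\ \{0,1\} & \text{if } |z_j| = \alpha/c,\\ \{0\} & \text{otherwise.}\end{cases}$$

We denote by $o$ the index set $\{j \mid G_{jj} = 0\}\subset\{1,2,\ldots,m\}$ and by $i$ its complement, and let $z = Ex + \lambda/c$. We note that the relation $c\big(z - \mathrm{prox}_{\phi/c}(z)\big)_i = \alpha\,\mathrm{sign}(z_i)$ holds, and that $\mathrm{prox}_{\phi/c}(z)_o = 0$, since $|z_j|\le\alpha/c$ for $j\in o$. An argument similar to that in Example 4.9 yields the following Newton update:

$$\begin{bmatrix} D_x^2f(x) & E_o^T\\ E_o & 0\end{bmatrix}\begin{bmatrix} x^+\\ \lambda_o^+\end{bmatrix} = \begin{bmatrix} D_x^2f(x)x - D_xf(x) - E_i^T\lambda_i^+\\ \mathrm{prox}_{\phi/c}(z)_o\end{bmatrix} \quad\text{and}\quad \lambda_i^+ = \alpha\,\mathrm{sign}(z_i).$$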


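For the quadratic function $f(x) = \frac12(x,Ax) - (b,x)$, we obtain a primal-dual active set algorithm for $\ell^1$ norm regularization:

$$\begin{bmatrix} A & E_o^T\\ E_o & 0\end{bmatrix}\begin{bmatrix} x^+\\ \lambda_o^+\end{bmatrix} = \begin{bmatrix} b - E_i^T\lambda_i^+\\ \mathrm{prox}_{\phi/c}(z)_o\end{bmatrix} \quad\text{and}\quad \lambda_i^+ = \alpha\,\mathrm{sign}(z_i).$$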

5 Conclusion 

In this paper, we have extended the classical Lagrange multiplier approach to a class of nonsmooth convex optimization problems arising in various application domains. We presented the Lagrange optimality system and established the equivalence among the Lagrange optimality system, the standard optimality condition and the saddle point condition of the augmented Lagrangian. The Lagrange optimality system was used to derive a novel Newton algorithm. We proved the nonsingularity of the Newton system and established the local convergence of the algorithm.

In order to make the proposed Newton algorithm applicable to real-world applications, further study is needed on several important issues, including: constructing a merit function for the globalization of the algorithm; developing efficient solvers for the (possibly) large linear systems arising in the Newton update; providing a stopping criterion; and reporting the numerical performance of the algorithm. These issues will be investigated in future work.